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COMPUTING MINIMAL POLYNOMIALS OF MATRICES 
MAX NEUNHOFFER and CHERYL E. PRAEGER 

Abstract 

We present and analyse a Monte-Carlo algorithm to com- 
pute the minimal polynomial of an n x n matrix over a finite 
field that requires 0{n'^) field operations and 0{n) random 
vectors, and is well suited for successful practical implementa- 
tion. The algorithm, and its complexity analysis, use standard 
algorithms for polynomial and matrix operations. We compare 
features of the algorithm with several other algorithms in the 
literature. In addition we present a deterministic verification 
procedure which is similarly efficient in most cases but has a 
worst-case complexity of 0{n^). Finally, we report the results 
of practical experiments with an implementation of our algo- 
rithms in comparison with the current algorithms in the GAP 
library. 

1 . Introduction 



et F be a finite field and M S F"'*" a matrix. This paper presents and analyses 
^Monte Carlo algorithm to compute the minimal polynomial of M, that is, the 
,^onic polynomial /i G F[a;] of least degree, such that fJ,{M) = 0. Determining the 
"^inimal polynomial is one of the fundamental computational problems for matrices 
S^d has a wide range of applications. As well as revealing information about the 
L_Ht)benius normal form of M, the minimal polynomial also elucidates the structure 
of F" viewed as F[a::]-module, where x acts by multiplication with M . In addition 
Qie order of M modulo scalars is often found by first determining the minimal 
^.l^lynomial. Apart from these applications it has important practical utility, for 
^s^ample in the context of the matrix group recognition project [10]. 
0^ For these and other reasons a number of algorithms to determine the minimal 
^blynomial may be found in the literature. We discuss some of them below. Our 
l/pyimary objective was to provide a simple and practical algorithm that could be 
^Siplemented easily and would work well over small finite fields. In particular we 
r3td not want to produce matrices with entries in larger fields or polynomial rings 
intermediate results, and we preferred to restrict ourselves to using only row 
leperations (rather than a combination of row and column operations). In addi- 
totjon we wished to use standard field and polynomial arithmetic, and we wished to 
jg^ve an explicit worst-case upper bound for the number of elementary field opera- 
®i)ns needed, and not only an asymptotic complexity statement. Our Monte Carlo 
algorithm adheres to these requirements for matrices over fields F^ of order q. 
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Theorem 1.1. For a given matrix M e F^^" and a positive real number e < 
1/2, Algorithm 5 computes the minimal polynomial of M with probability at least 
1 — e. For sufficiently large n and fixed e, the number of elementary field operations 
required is less than 7n^ plus the costs of factorising a degree n polynomial overFq 
and constructing at most n random vectors in F". 

Our algorithm to compute the minimal polynomial first computes the charac- 
teristic polynomial in a standard way by spinning up and then factoring out cyclic 
subspaces. However, the novel aspect in this first phase is the introduction of ran- 
domisation. While not necessary for the computation of the characteristic polyno- 
mial it underpins our proof of the Monte Carlo nature of our minimal polynomial 
algorithm. In addition to the Monte Carlo minimal polynomial algorithm we present 
and analyse in Section 8 a deterministic verification procedure to be run after Al- 
gorithm 5 that has a similar asymptotic complexity in many cases, but is O(n^) 
in the worst-case scenario. Our motivation for giving concrete upper bounds for 
the costs of various component procedures was that, in practical implementations, 
these assist us to compare different algorithms in order to decide which to use in 
different situations. At the end of the paper we discuss a practical implementation 
and tests of the algorithms in the GAP system [4]. 

1.1. Other algorithms in the light of our requirements 

There are several interesting and asymptotically efficient minimal polynomial 
algorithms for n x n matrices in the literature. The most asymptotically efficient 
deterministic algorithm is due to Storjohann [14] in 2001. It is nearly optimal, 
'requiring about the same number of field operations as required for matrix multi- 
plication' (see [14, Abstract, p368]). It involves a dividc-and-conqucr strategy that 
produces matrices with entries in polynomial rings as intermediate results. Chang- 
ing the scalars to a larger field or polynomial ring is something we wished to avoid 
as it creates additional complications in practical applications within a computer 
algebra system used for group and matrix algebra computations. 

Storjohann's earlier deterministic algorithm [13] in 1998 uses classical field arith- 
metic and requires 0{n^) field operations. It first reduces the matrix to 'zig-zag 
form', using a mix of row and column operations, then produces the Smith normal 
form as a matrix with polynomial entries, and finally the Frobenius normal form. 
In systems such as GAP, matrices over small finite fields are stored in a compressed 
form that makes row operations simple, but column operations difficult. Restricting 
to one of these types of operations was one of our criteria. 

A Monte Carlo minimal polynomial algorithm of Gicsbrecht [5] from 1995 that 
runs in 'nearly optimal time' contains some features we find desirable for practical 
implementation, namely his algorithm first constructs a 'modular cyclic decomposi- 
tion' using random vectors, similar to our characteristic polynomial computation in 
Section 5. However, further steps include a modification of the 'divide-and-conquer' 
Keller-Gehrig algorithm [8] and lead to a Las Vegas algorithm that computes a 
Frobenius form over an extension field and then the minimal polynomial. The field 
size over which the given matrix is written is assumed to be greater than , and if 
this is not the case it is suggested that an embedding into a larger extension field 
be used. Several of these features were undesirable for us. 

In [1, Section 4] Augot and Camion propose a deterministic algorithm to compute 
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the minimal polynomial of a matrix which is to some extent similar to our algorithm. 
It is deterministic with complexity 0(n^ +to^ • n?) field oporatious, where m is the 
number of blocks in the shift Hessenberg form. They prove that the complexity is 
0{n^) in the average case. However, in the worst case it is O(n^), and no constants 
are provided in the complexity estimates. Although the principal approach of their 
algorithm is similar to ours, the details differ very much from our algorithm and 
analysis. 

An interesting commentary on various algorithms, together with some new al- 
gorithms is given by Eberly [3]. Eberly (see Theorem 4.2 in [3]) gives in particular 
a randomised algorithm for matrices over small fields that produces output from 
which (amongst other things) the minimal polynomial can be computed, at a cost 
of O(n^). The papers [3, 5, 12, 13, 14] contain references to other minimal poly- 
nomial algorithms. In all of the algorithms mentioned the asymptotic complexity 
statements give no information about the constants involved. 

On a practical note, the minimal polynomial algorithm implemented in the GAP 
library is the one in [12] and (although we have been unable to confirm this) we 
assume that this is the algorithm implemented in Magma [2] . 

1.2. Outline of the paper 

In Section 2 we introduce our notation, in Section 3 we cite a few complexity 
bounds for basic algorithms. The next Section 4 introduces order polynomials and 
derives a few results about them. Then we turn to the computation of the character- 
istic polynomial in Section 5, since this is the first step in our minimal polynomial 
algorithm, which is described and analysed in Section 7. We explain and modify the 
well-known algorithm to compute characteristic polynomials by introducing some 
randomisation, because this is later needed in the analysis of our main Monte Carlo 
algorithm. In Section 6 we give some probability estimates that are also used later 
in the analysis. The second last Section 8 covers the deterministic verification of 
the results of our Monte Carlo algorithm. We describe in detail cases in which this 
verification is efficient and when it has a worse complexity. Finally, in Section 9 
we report on the performance of an implementation of our algorithm, including 
runtimes in realistic applications. We compare these times with the current imple- 
mentation for minimal polynomial computations in the GAP library (see [4]), and 
as mentioned above, we believe that Magma and GAP are both using the algorithm 
in [12]. Wc show that our algorithm performs much better in important cases, and 
that our bounds on the computing cost are reflected in practical experiments. 

2. Notation 

Throughout the paper F will be a fixed field. Although we envisage F to be a 

finite field for our applications, this is not necessary for most of our results. However 
in the later sections we use some probability estimates from Section 6 that are only 
valid for finite fields. 

By an elem,enta,ry fi,eld operation we mean addition, subtraction, multiplication 
or division of two field elements. In all our runtime bounds we will assume that one 
elementary field operation takes a fixed amount of time and we simply count the 
number of such operations occurring in our algorithms. 

We denote the set of (m x n)-matrices over F by F™^" and the set of row vectors 
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of length m by F"*. For a vector v G ¥™ we write Vi for its i-th component and for 
a matrix M G F™^" we denote its i-th row, which is a row vector of length n, by 
M[i]. We use "row vector times matrix" operations, and in general right modules 
throughout. If F is a vector space over F and W is a subspace, the quotient space 
is denoted by V/W and its cosets hy v + W for v G V. The F- linear span of the 
vectors v'-^^ , . . . jV^''^ £ V is denoted by (^v^^\ . . . , v'^'^'')^. 

If M G F"^" is a matrix and F = F", we have a natural action of M as an 
endomorphism of V by right multiplication. The same holds for every M-invariant 
subspace W < V and for the corresponding quotient space V/W. We describe such 
a situation by saying that "the matrix M induces an action on the F- vector space" 
V, W, V/W respectively. 

Throughout. ¥[x] denotes the polynomial ring over F in an indeterminate x. For 
a square matrix M and a polynomial p G F[a;] we denote the evaluation of p at M 
byp(M). 

Whenever a matrix M induces an action on a vector space U, we will view [/ as a 
right F[a;]-module by letting x act like M, that is v-x := v-M in the above examples. 
We denote the characteristic polynomial of this action by Xm,u- That is, Xm,u is 
the characteristic polynomial of the {dimr{U) x dimF(t/))-matrix given by choosing 
a basis of U and writing the action of M induced on [/ as a matrix with respect to 
that basis. We use the same convention analogously for the corresponding minimal 
polynomial fiM.u- Furthermore, we denote the F[a;]-submodule of U generated by 
the vectors u^^\ . . . , m^") by {u^^\ . . . , m^"^)^^^. 

We use the two functions 

a a 

s^^\a,b):= I and s^'^\a,h) := ^ ^ (1) 

1=5+1 i=b+l 

for complexity expressions. Note that for a > 6 > c we have s^^\a^c) = s^^\a,b) + 
sW(6,c) for j e 1,2 and 

sW(n,0) = sW(",-l) = ^^^^^ and (2) 

.(2)(n,0) = a(2)(n,-l) = "(" + ^^+1) . (3) 

6 

For later complexity estimates we note the following inequalities. 

Lemma 2.1 (Some upper bounds). If n = J2i=id'i /^'^ some di G N \ {0} and 
Sj :— J2i=i have 

A nin+l) , ^ , , n(n+l)(n + 2) 

Proof: We claim that for fixed n both expressions arc maximal if and only if all di 
are equal to one. We leave it to the reader to check that both totals increase if we 
replace dj in some sequence {di)i<^i^k by the two numbers a and dj — a resulting 
in the new sequence {d[,. . . , c?fc_|_i) := (^1,^2, . . . , dj-i,a, dj — a, dj+i, . . . , dk) of 
length fc + 1. ■ 
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3. Complexity bounds for basic algorithms 

In some algorithms presented in later sections we use greatest common divisors 
of univariate polynomials. To analyse these algorithms we use the following bounds 
which arise from standard polynomial computation. We take this approach because 
the standard algorithms for polynomials are good enough for our complexity es- 
timates in applications and we do not need the asymptotically best algorithms, 
discussion of which may be found conveniently in [15]. 

Proposition 3.1 (Complexity of standard greatest common divisor algorithm). 

Let f,gG ¥[x] with n := deg/ ^ degg =: m, and f = qg + r with q,r £ ¥[x] 
such that r = or degr < degg'. Then there is an algorithm to compute q and r 
that needs less than 2(m + l)(n — m + 1) elementary field operations. 

Furthermore, there is an algorithm to compute gcd{f,g) that needs less than 
2(rn + l)(n + 1) elementary field operations. 

Remark 3.2. We intentionally give bounds here which are not best possible, since 
we want the bound for the gcd computation to be symmetrical in m and n. 

Proof of Proposition 3.1: Use polynomial division and the standard Gcd algorithm 
and count. See [15, Section 2.4 and Section 3.3] for smaller bounds that imply our 
symmetric bounds. ■ 

3.1. Polynomial factorisation 

Some of our algorithms return partially factorised polynomials which facilitate 
later factorisation into irreducible factors. However, since the extent of this par- 
tial factorisation is difficult to estimate, we use in our analyses the complexity of 
finding the complete factorisation of a polynomial over a finite field as a product 
of irreducibles. We need such factorisations in our main algorithm. In keeping with 
our other methods we make use of standard polynomial factorisation procedures. 

Details can be found in Knuth [9, 4.6.2] of a deterministic polynomial factorisa- 
tion algorithm inspired by an idea of Berlekamp. Its cost is polynomial in both the 
degree n and field size |F| = as it requires 0{q) computations of greatest common 
divisors. Thus it works well only for q small. There is available a randomised (Las 
Vegas) version of the procedure which (for arbitrary q) will always return accurately 
the number r of irreducible factors of f{x) G F[a;], but for which there is a small 
non-zero probability that it will fail to find all the irreducible factors. It involves 
the procedure Random Vector, which is discussed further in Subsection 5.1, to 
produce independent uniformly distributed random elements of an n-dimcnsional 
vector space over F for which a basis is known. Throughout the paper logarithms 
are always taken to base 2. 

Remark 3.3 (PolynomialFactorisation). Suppose f{x) e ¥[x\ of degree n > 1 
with r irreducible factors (counting multiplicities) and, if q is large, suppose that 
we are given a real number e such that < e < 1/2. The number fact(n, g) of 
elementary field operations required to find a complete set of irreducible factors of 
f{x) is at most 

8n^ + {iqr + 171ogg)n^ deterministic algorithm 

0((log £~^)(log n)(^„ log^ q) + log^ g) Las Vegas algorithm 
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where ^„ is an upper bound for the cost of one run of RandomVector on F". 
The Las Vegas algorithm may fail, but with probability less than e. 



Let M be a matrix in F"^" that induces an action on an F- vector space V. 
We briefly recall the definition of the term "order polynomial" : 

Definition 4.1 (Order polynomial ordM(w) and relative order polynomial). The 
order polynomial ordM(w) of a vector v (z V is the monic polynomial p G F[a;] of 
smallest degree such that v ■ p{M) = G In particular ordM(O) = 1. 

For an M- invariant subspace W <V, the relative order polynomial oidMiv + VV) 
(of V relative to W) is the order polynomial of the element v + W G V/W with 
respect to the induced action of M on V/W. 

Remark 4.2. If we consider V as an F[a;]-modulc as in Section 2, then p is the 
monic generator of the annihilator ann^[x](f) of v in F[a;]. 

The following observation follows immediately from the definition above. 

Lemma 4.3 (Relative order polynomials). For an M -invariant subspace W < V 
and V gV, ordM(w + W) is the monic polynomial p G ¥[x] of smallest degree such 
that v-p{M) e W. 

We now turn to the question of how one computes the order polynomial of a 
vector V dz V . The basic idea is to apply the matrix M to the vector repeatedly 
computing a sequence v, vM, vM"^ , . . . , vM'^ until vM"^ is a linear combination 



i=0 

with Oj € F, for < i < d. If d is minimal such that this is possible, we have 

d-l 



Although this procedure is simple and well-known, we present it in order to make 
explicit the number of elementary field operations needed. To this end we describe 
in detail the computation of solutions for the systems of linear equations involved. 

Definition 4.4 (Row semi echelon form). A non-zero matrix S = {Si_j) € F™^" 
is in row semi echelon form if there are positive integers r ^ m and ji, ■ ■ ■ , jr ^ ri 
such that, for each i ^ r, Sij^ = 1 and Skj^ = for all k > i, and also Skj = 
whenever k > r. For i ^ r, column ji is called the leading column of row i, and 
we write lc(i) = ji- A sequence of vectors u'^^\ . . . , u^™^ G is said to be in semi 
echelon form if the matrix with rows u^^\ . . . , w^"*^ is in row semi echelon form. 

Note that in Definition 4.4 we do not assume ji < j2 < ■ ■ ■ < jr which is the usual 
condition for an echelon form. 

Definition 4.5 (Semi echelon data sequence). Let Y G f^x" be a matrix with 
m ^ n and of rank m. A semi echelon data sequence for F is a tuple y = {Y, S, T, I), 



4. Order polynomials 



d-l 
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where S G F"^^" is in row semi echelon form with leading column indices I = 
(lc(l), . . . , lc(m)), and T € GL(m, F) with TY = S. Further, T is a lower triangular 
matrix, that is, for T = {Tij) we have Tij = for i < j. For a semi echelon data se- 
quence y we call the number m its length, sometimes denoted length(y). A semi ech- 
elon data sequence y' = {¥', S', T', I') is said to extend y if lcngth(>") > length(>'), 
the first lcngth(3^) rows of Y' and S' form the matrices Y and S respectively, and 
the first lcngth(3^) entries of I' form the sequence I. 

Remark 4.6. (a) The idea of this concept is that for a matrix S G F™^" in row 
semi echelon form it is relatively cheap to decide whether a given vector f e F" 
lies in the row space of S, and if so, to write it as a linear combination of the rows 
of S, that is, to find a vector a G F™ such that v — aS — aTY (see Algorithm 1). 
Thus, the vector v is expressed as a linear combination of the rows of Y using the 
vector aT as coefficients. 

(b) We call a semi echelon data sequence trivial if m = 0. In this case, by 
convention, we take the row spaces of the empty matrices Y and S to be the zero 
subspace of F", we denote the empty sequence in F° by 0, and we interpret aS as 
the zero vector of F". 



We now present Algorithm 1, which is one step in the computation of a semi 
echelon data sequence for a matrix Y. We denote by S[i] the i-th row of the matrix 
S, and by RowSp(5) the row space of S. 

Algorithm 1 CleanAndExtend 

Input: A semi echelon data sequence y = (F, S, T, I) with Y,S G F™^", u e F" 
(possibly TO = 0). 

Output: A triple {c,y', a') where c is True if t> G RowSp(y) and False other- 
wise, y equals or extends y respectively with length(y) < length(y) -|- 1, and 
a' e rength(D;'), such that v = a'S'. 



w := V 

a := e F™ {note that w = v- aS} 
for i = 1 to TO do 

tti := wi^ 

w := w — tti • S[i] 
end for {still w = v — aS} 
if w = then 

return (True, (F, S, T, I), a) 
else 

j := index of first non-zero entry in w 



a := \a 
Y' : 



Y 

V 



I' [I j] 
S' := 



S 
.-1 



w 



return (False, (y' , 5', T', I'), a') 
end if 



-wj'-aT wf 



Proposition 4.7 (Correctness and complexity of Alg. 1: CleanAndExtend). 
The output of Algorithm 1 satisfies the Output specifications. Moreover, Algorithm 1 
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requires at most 2mn field operations ifvG RowSp(y), and (2m+ l)n+ (m+ 1)^ + 1 
field operations otherwise. 

Remark 4.8. (a) Given a semi echelon data sequence {Y, S, T, I) with Y,Sg F"*^" 
and a vector v G F", Algorithm 1 tries to write v as a linear combination of the rows 
of S. If this is not possible, it constructs an extended semi echelon data sequence. 

(b) For the case of finite fields a simple and useful optimisation is to reduce, where 
possible, the number of operations for vectors and matrices, for example, where a 
vector is multiplied by the zero scalar and the result is added to some other vector. 
This can reduce the number of operations for sparse vectors and matrices. Our 
estimates for the numbers of field operations then become over-estimates. 

Proof of Proposition ^.1: The proof of the correctness of Algorithm 1 is left to the 
reader. The for loop needs 2mn field operations if we count both multiplications 
and additions. If w G RowSp(y) then the algorithm terminates after this loop. 
On the other hand, if w ^ RowSp(y), then Algorithm 1 needs one inversion of 
the scalar Wj plus 2 • * — m(m + 1) field operations for the vector times 

matrix multiplication aT, because T is a lower triangular matrix. This is altogether 
m(m+ 1) + 1 operations. Finally, the scalar negation of and the multiplication 
of aT by —w^^ needs another m + 1 field operations, and a further n operations 
are needed for the computation of wj^w in S'. Thus the total number of field 
operations is at most 2mn + {m. + 1)^ + 1 + n. ■ 

Having Algorithm 1 at hand we can now present Algorithm 2, which computes 
relative order polynomials. Since a (non-relative) order polynomial may be regarded 
as a relative order polynomial with respect to the zero subspace. Algorithm 2 can 
also be used to compute order polynomials, starting with the trivial semi echelon 
data sequence, (see Remark 4.6 (b)). 

Proposition 4.9 (Correctness and complexity of Alg. 2: RelativeOrdPoly). 

Let y = {Y,S,T,l) be a semi echelon data sequence with Y,Sg f™^" (possibly 
m ^ 0), V e F", and M G F"^" such that W := RowSp(y) is M-invanant. The 
output of Algorithm 2 satisfies the Output specifications, and moreover ifd > 0, then 
rows m + 1, . . . ,m + d ofY' are equal to v, vM, . . . , vM'''~^ respectively. Algorithm 2 
requires at most 

2dn'^ + (n + 2)d + 2{m + d)n + 2{n + l)s^'^\m + - 1, m - 1) + 
+ s(2) (jrij^d-l,m-l) + 2s(i) (m + d, 0) 

elementary field operations where s^^^ and s^^^ are the functions defined in (1). 

Remark 4.10. Note that, if = then S' = S , so v = b'Y e W, and in this case 
p = 1. Algorithm 2 successively considers the vectors v + W, vM + W, . . . , vM'^ + W 
(those are the successive values of v') until ■uM'* + W lies in the subspace of V/W 
generated by the vectors v + W, vM + W. . . . , vM"^^^ + W. The given matrix S 
together with Algorithm 1 defines a direct sum decomposition of the F-vector space 
V — W^ = W ® W where W is the subspace of vectors having in all positions 
occurring in the list I. Since W' = V/W , Algorithm 2 eflfectively computes in V/W 
by always 'cleaning out' vectors using S first. 
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Algorithm 2 RelativeOrdPoly 

Input: A semi echelon data sequence y = {Y, S, T, I) with Y, S G F™^" (possibly 
m = 0),v€ F", and M e F"^" such that W := RowSp(F) is M-invariant. 
Output: A triple {p,y',b) consisting of the relative order polynomial p := 
ordM{v + W) of degree d, a semi echelon data sequence y' = {Y' , S' ,T' ,1') 
of length m + d equal to or extending y, and a vector b e F'"+'^ such that 
vM"^ = bY'. 

{Y', S', T', I') := (Y, S, T, I) {the primed variables change during the algorithm} 
v' := V 

m' := m {can be zero!} 

loop 

(c, (F', S', T', V), a) := CleanAndExtend((F', S', T', I'), v') 

{T'Y' = S', v' = aS'} 

if c = True then 

leave loop 
end if 

v' ■.= v'-M 
m' :=m' + 1 

end loop {at this stage c = True, v' = aS', m' = length(3^')} 

d := m' — m 

b:=a-T' 

p:=x'^- J2tZo bm+i+ix' 
return {p, {Y' , S' ,T' ,l'),b) 



Proof of Proposition 4-9: We again leave the proof of correctness of Algorithm 2 to 
the reader. Algorithm 2 calls Algorithm 1 (CleanAndExtend) exactly d+1 times 
with the lengths of the input semi echelon data sequences being to, m + 1, . . . , m + d. 
After each but the last call to Algorithm 1 the value of c returned is False, and 
after the last call the value of c is True. Thus, by Proposition 4.7, the number of 
steps needed for the d+1 runs of Algorithm 1 is at most 

(m+d-l \ 
{{2i + l)n + (z + 1)2 + 1) + 2(m + d)n 
i=m / 

= s^'^\m + d-l,m-l) + (2n + 2)sW(m + d- l,m- 1) + {n + 2)d + 2{m + d)n. 

In addition, we have to do d multiplications of v' with M, which require 2n^ elemen- 
tary field operations each, and finally the computation of b requires 28^^"^ (to + d, 0) 
elementary field operations, again since T' is a lower triangular matrix. Summing 
up gives the expression in the statement. ■ 

We conclude this section with two lemmas that are used to compute absolute 
order polynomials using relative ones. We again view V as an F[a;]-module by let- 
ting X act like M. For {■y^^), . . . , u^™)} C V, we denote by {v^^\ . . . ,v^"'^) ^ the 
submodule of V generated by {v^^\ . . . ,v^"^^}, that is, the smallest M-invariant 
subspace containing {v^^\ . . . ,v^"^^}. If to = 1 then is the F-span of the set 

{^(1)^ ^(1) jVf, . . . ,z;(i)M"~^}. We call {v^^^)^^ a cyclic subspace relative to M. 
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Lemma 4.11 (Order polynomials in cyclic subspaces). Let v gV, W = {v)j^ <V, 
and p := oTdM{v) with d := deg(p). Then for each w G W, there is a unique 
polynomial q £ ¥[x] of degree less than d such that w = vq{M). Moreover, 

ordM(w) = — — -. 

gcd(p, q) 

We omit the routine proof for the sake of brevity. ■ 

Lemma 4.12 (Absolute and relative order polynomials). Let W be an M -invariant 
suhspace ofV,v€V and q := ordAf (w + W) G ¥[x]. Then 

ordM{v) = q ■ ordM{vq{M)). 
We omit the routine proof for the sake of brevity. ■ 

5. Computing the characteristic polynomial 

In this section wc present a version of a standard algorithm for computing the 
characteristic polynomial of a matrix together with its analysis. It differs from the 
standard version in its use of randomisation. 

5.1. Random vectors 

Our characteristic polynomial algorithm, and later ones, make use of the algo- 
rithms RandomVector and RandomVector* that produce independent uni- 
formly distributed random vectors, and independent uniformly distributed random 
non-zero vectors, respectively, in a given finite vector space for which a basis is 
known. The algorithms are invoked for spaces F*, for s e N, and for subspaces of 
V of the form 

V{1) = {v\vi. = for l^i^m} where I = {li, . . . J^n)- 

If I is the empty sequence then V{1) = V. For a semi echelon data sequence y = 
(y, S, T, I), the vector space V is the sum V = V{1)® RowSp(5'). 

If 6 = RandomVector(F'''"s*'^(^)), then bS is a uniformly distributed random 
vector of RowSp(5'). Moreover we assume that for the disjoint spaces F'''"sth(3^) g^j^^j 
V{1) the algorithms RandomVector and RandomVector* are applied indepen- 
dently so that in particular, if a = RandomVector* (F(Z)) then the sum a + bS 
is a uniformly distributed random vector of \ RowSp(S'). 

RandomVector and, if we neglect the possibility of obtaining the zero vec- 
tor, also RandomVector*, could proceed by selecting independent uniformly dis- 
tributed random field elements as coefficients of the basis vectors. For the subspace 
V{1), we could put zeros into the entries occurring in / and make random selec- 
tions of elements from F for each entry not in I. For this reason we denote by 

an upper bomid for the cost of RandomVector or RandomVector* ap- 
plied to an r-dimensional space for one of these cases. If r < s then < and 
+ ^ Cri+r2J Eiud WC would cxpect to Vary linearly with r. In practical 
implementations the cost is much less than the cost of the field operations involved 
in the algorithm below. 
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5.2. Characteristic polynomial algorithm 

The characteristic polynomial algorithm below would terminate successfully with- 
out making random selections of vectors. However, the use of randomisation is key 
to our application of this algorithm for finding minimal polynomials. As in previous 
sections, let M be a matrix in F"^" acting naturally on V := F". 

Algorithm 3 CharPoly 
Input: M e F"^" 

Output: A tuple (fc, 3^, where each p'-j^ e F[a;] and 

^t=l^'^*^ = Xm,v is the characteristic polynomial of M in its action on V, each 
g ]F" and 3^ is a semi echelon data sequence of length n with the properties 
specified in Proposition 5.1. 

i := 

37(0) := a trivial semi echelon data sequence 
while lengthO'^') ) < n do 
i:=i + l 

a := RANDOMVECTOR(F'''"eth(j;('-i')) 

C := RANDOMVECTOR*(y(Z('~-^))) 

t;« := aS'(*-i) +c 

{ v« ^ RowSp(5(*-i)) where y^''^^ = {Y'^'-^\ S'^'-^\T'^'-^\l<^'-^y)} 
(pW,3;(i),i,W) := RelativeOrdPoly(3;('-i),i;W,M) 

{5(0 g ]piength(3^(')). ^ _ iength(>'(')) zeros to make b^'^ e F"} 

end while 
k := i 

return {k, (6«')i^j^fc) 



Proposition 5.1 (Correctness and complexity of Algorithm 3). Algorithm 3 sat- 
isfies the Output specifications, and furthermore y = (Y, S, T, I) where F g F"^" is 
invertible with rows 

4'\4'^M, . . . , 4'^M''^-\4^\4^^M, . . . , 4^)M^^-\ . . . , t/^), #)M, . . . , #)m''^-i 

where di :— deg{p^^'>) for 1 ^ i ^ k. Further, for 1 ^ i ^ k, is a uniformly 
distributed random element of V \ Wi^i, w^'^M''' = b'^^^Y and p^"^^ — oidM{v^^^ + 
Wi-i), where Wi-i := (w^^-*, . . . , w*^*^^-')^^^ (for i > 1), an M-invariant subspace 

of V of dimension Sj-i := d'j' '^'^^ ^0 = of dimension sq = 0. Moreover, 

Algorithm 3 requires at most 

33 Q , n 3 

—n'^ + 4n^ + -n 
o ^ 

elementary field operations, plus k^n for the k calls to RandomVector* and Ran- 
domVector. Neglecting the latter cost this is less than 6n^ elementary field oper- 
ations, for sufficiently large n. 

Remark 5.2. We denote the semi echelon data sequences 3^*^*^ in the algorithm 
using indices to enable us to speak more easily about the intermediate results. 
However in practice we have only one variable y = {Y, S, T, I), the entries of which 
are growing during the execution of the algorithm. 
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Remark 5.3. Note that we do not multiply together the factors of xm,v because 
in our application of Algorithm 3 we do not need the product itself. 

Proof of Proposition 5.1: Most statements in the proposition follow immediately 
from Proposition 4.9: note that, because of the conventions explained in Remark 
4.6(b), in the first run of the 'while' loop v'^^'' = c is a random non-zero vector of 
V and = ord)M(t;(i)), and more generally, in the i run of the 'while' loop. 
Algorithm 3 chooses a vector w*^*^ that is a uniformly distributed random clement 
of y \ RowSp(5'(*~^^) and applies Algorithm 2. This immediately establishes all 
statements about (F, 5, T, I) including the one about the invertibility and the rows 
of Y. Also it is clear that = ordM(i/'^ + Wi-i). 

Next we show that ni=iP''*^ = Xm,v- This follows by considering the matrix 
YMY~^, which has the same characteristic polynomial as M. Considering the 
action of A/ with respect to the ordered basis of F" given by the rows of F, it 
follows from the construction that YMY~^ (written with respect to the standard 
basis) is equal to 

" Ci ••• " 

: ■•• ■•. 

. ••• Ck . 

where the matrix Cj is the companion matrix of the polynomial p^'^ , and the sj'^ , 
for 2 ^ i ^ k and \ ^ j ^ i — 1, are matrices in F'^'^'^^ with one non-zero row 
at the bottom and all other rows zero. If = {hf, ...,b'n^), then the bottom 
row of Bj^^ is ib'"s._^+i, ■ ■ ■ ,b^8j)- With this format at hand it is clear that the 

characteristic polynomial oiYMY~^ is equal to the product HiLiJ^^*^ because the 
Ci are companion matrices. 

Finally wc derive the statement about the number of elementary field operations 
needed by Algorithm 3. In the z"' run of the while loop, the cost of constructing 
the random vectors a and c is at most 

Cn-length(J'(*-i)) + 6ength(>'(*-i) ) ^ ^n, 

(see Subsection 5.1). The cost to compute v^^^ is at most 2sj_in elementary field 
operations, where sq = 0, and for i ^ 1, Si = J2]=i dj with dj = degp^^^. The cost 
of applying Algorithm RelativeOrdPoly is, by Proposition 4.9, at most 

2d^n2-H(n-H2)(i,-^2s,n-h2(n-M)s(i)(s»-l,Si_i-l)+s(2)(s^-l,Si_i-l)-F2s(i)(si,0) 

elementary field operations, noting that the value of 'd' is di, the value of 'm' is 
Si_i, Si-\ + di = Si, and s^^\ s^^^ are the functions defined in (1). 

We consider the different terms one by one, summing each over i from 1 to A;. 
The total cost of constructing the random vectors is at most fc^„. Summing the 
terms 2si-in gives 2n^^^j^Si_i, and summing the terms 2din^ gives 2n^ since 
Yli=i '^i — '^^ Similarly, summing the terms (n -|- 2)di gives (n -|- 2)n. From the 
terms 2sin we get a contribution of 2n^^^^ Si. The next two expressions involving 
the functions 5*^^^ and 5*^^^ sum to 2{n + l)s(^)(n — 1,0) = n{n + l)(n — 1) and 
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s^'^\n— 1, 0) = (" ^)n(2n 1) j-espectively, using (2) and (3) and the properties noted 
above it. Finally, the terms 2s(i)(si, 0) sum to 2 s^'^^Si, 0) = + !)• 

Thus in total we obtain k^n plus 

^ -i , .s, ,N (n— l)n(2n— 1) , „ 
2r? + n(n + l)(n - 1) + '-^ + n(n + 2) + 2n Si-i 



6 



+ 2n^Sj + ^Si(si + 1) 



j=l i=l 



elementary field operations. The first four of these terms sum to + \r? + \n. 
Using Lemma 2.1, 

^ „ / ,N „ 9/ .X n(n + l)(n + 2) 
2n Si_i + 2n Si + 2^ Si (si + 1) < 2n2 (n + 1) + ^ 

i—l i=l i=l 

so the total cost is at most 
For sufBciently large n this is less than 6n^ + 



6. Probability estimates using the structure theory for modules 

The basic idea of our minimal polynomial Algorithm 5 is to compute the order 
polynomials of a few random vectors under the action of a given matrix M and 
to prove that, with high probability, their least common multiple is equal to the 
minimal polynomial of M. The purpose of this section is to use the structure theory 
of y = F" as an F[a;]-module to derive probability estimates to be used in that proof. 

First suppose that the characteristic polynomial of M is written as a product 
Xm.v = n*=i 1i' with pairwise distinct irreducible polynomials Qi G ¥[x\ and posi- 
tive integer multiplicities Cj. 

Using [7, Theorem 3.12] we can then write the F [a;] -module F as a direct sum 
of primary cyclic modules 

t mi 

V = ^^Wij¥[x] (4) 

such that ordM(wi,j) = with a > > fi,2 > ■ ■ ■ > fi,mi > 1 and YJjli fi,j = 
ej for 1 ^ i ^ t. 

The minimal polynomial ij,m,v is the least common multiple of the order poly- 
nomials of the vectors (wij)i^i^t,i^j^m,j, and hence is ^m,v = Y[l=iili)^''^ ■ 

We use this structural description to derive the first probability bound for the 
case where F = Fg is a finite field with q elements. 

Proposition 6.1 (Probability that a has equal mult, in /Um.v and ordM(t'))- 
Let F = Fg be a finite field with q elements, let V = F", let U be a (possibly zero) 

M -invariant subspace such that the multiplicity of Qi in fJ.M,u is strictly smaller 
than in /Um.v, and let v be a uniformly distributed random element ofV\U. Then 
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the multiplicity of qi is the same in ordM(t^) and hm,v with probability greater than 
I — q~ deggi _ 

Proof: By assumption the multiplicity of qi in fiM.u is kss that its multiplicity 
/ '■= fi,i in IJ'M,v- Let w := Wi^i, with Wis as in (4), so that V = X (BY with X, Y 
invariant under M and X = {w) j^. Then /xm,x = <?/• We may identify the primary 
cyclic F [a;] -module X with wF[a;], which is isomorphic to the module F[a;]/ (g/ F[a;]), 
and in turn this is uniserial with composition series 

P ^ < «il!!M < . . . < ^ffl < ^[^] 



Thus, X has a unique maximal F[a;]-submodule, namely X' := {wqi{M)) j^, and X' 
has codimension r := deg(g'j) in X. 

As discussed above, each vector v has a unique expression as u = a; + y with 
X £ X, 2/ e F. Moreover ovAm{v) is the least common multiple of ordM(a;) and 
ordM(y)- In particular, if x ^ X', then ordM(a;) = q{ and hence the multiplicity of 
qi in ordM(^^) and ^im,v is the same. The number of vectors v = x + y with x ^ X' 
is 

|x\x'|.|r| = (i-l)|x|.|y| = (i-l)g". 

Each of these vectors v lies mV\U since the multiplicity of qi in jiM.u is less than 
/. Thus the probability, for a uniformly distributed random v ^ V \ U , that the 
multiplicity of g, in ordM(^^) and fJbM,v is the same is at least 

1 1 

^'^~'^\v\u\ 



Remark 6.2. If for some irreducible factor qi we have Wi > 1 and = /i,2, then 
the above probability is even higher, because we can apply the above argument 
independently to two or more summands u;i,iF[a;] and t(;i,2F[a;]. 

We now give a second probability bound which will be crucial in our Monte 
Carlo algorithm to compute the minimal polynomial. In that algorithm we choose 
a sequence of vectors v^^\ . . . v^^^ such that v^^^ is a uniformly distributed random 
element of ^ \ {0}, and for i ^ 2 we choose v^'^^ as a uniformly distributed random 
element ofV\U, where U = {v^^\ . . . , We hope to find iim,v as the least 

common multiple of the orders of these vectors. 

Proposition 6.3 (Probability that an 1cm of order polynomials equals piM,v)- 

Let V = ¥q be a finite field with q elements. Suppose a sequence of vectors 
v^^\ . . . G V is chosen as follows: v^^^ is a uniformly distributed random ele- 
ment ofV\ {0}, and for i > 1, v^^^ is a uniformly distributed random element of 
F\<t;W,...,t;(^-i))^. Let 

f := lcm(ordM(^;(')),ordMO/''), • • . , ordM(i^("))). 
Then the probability that f = ^m,v is greater than 



I deg qi 

i=l 
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Proof: Consider the random experiment described in the statement. We first ex- 
amine one irreducible factor g,. Let Ei denote the event that the multiphcity of qt 
in / is strictly smaller than the multiplicity fi.i of qi in fjLM,v- Furthermore, for 
1 ^ i ^ M, let Fj be the event that the multiplicity of qi in ordM(w^^^) is strictly 
smaller than fi^i. 

Note that the Fj are not stochastically independent since we choose v'^^^ outside 
of (t)^^' , • • • , v^^~^^ However, = Fin-F2ri- • ■f^Fu because / is the least common 
multiple of the order polynomials of the v'^^\ By Proposition 6.1 applied with 
U = {0}, the probability Prob(i^i) is less than g~'^''g'?'. Moreover, in the situation 
that Fi n • • • n holds and j < u, we apply Proposition 6.1 with the subspace 
} conclude that the conditional probability Prob(Fj_|_i|Fi n 
■ ■■ (iFj) is less than qi-<i«sg« Thus we have 

Prob(£;i) = Prob(Fi) • Prob(F2|Fi) • Prob(F3|Fi n F2) • • • • 

• • • • Prob(F„|Fi n • • • n F„_i) < g-"degg._ 

Finally we consider all the different irreducible factors qi. 

Even though the events Ei,.. .,Et may not be stochastically independent, we 
have 

i i 

Prob(^i L\---L\Et)^J2 Prob(-Ei) < ^ g-«deg9i 

i=l j=l 

as claimed. ■ 



7. Computing minimal polynomials 

Our minimal polynomial algorithm runs Algorithm 3 as its first step. So as- 
sume, from now on, that we have already run Algorithm 3 and obtained all the 
output it produces, in particular the basis given by the rows of the matrix Y (as in 
Proposition 5.1), 

i;(i)M, . . . , t;(i)M*-i, . . . , w^^^M, . . . , v^''^ M'^--^) 

the relative order polynomials p^*^ = ordM(^'*-*-' + Wi-i), and the vectors b^^^ for 
1 ^ i ^ k. Also assume that we have factorised all the polynomials p^^^ as products 
P*'*'* =^ rij-^i 9^'^ of irreducible polynomials {qj)i^j^t- 

The matrices M and YMY^^ have the same characteristic and minimal poly- 
nomials. Also the order polynomials ordM(^^) and OTdYMY-i{vY~^) are equal for 
every v £V and thus also the order polynomials ordM(vF) and OTdyMY-^iv) are 
equal for every v € V. 

For the convenience of the reader we display the matrix YMY~^ in Figure 1. 
Note in particular that the matrix is sparse, provided that the degrees di arc not 
too small. Due to the special form of YMY~^ it is much more efficient to compute 
the images of vectors under YMY~^, than under M. This is crucial in the analysis 
of our algorithms. Therefore wc will from now on do all computations of order 
polynomials with respect to YMY~^. 

Set M' := FMF-i and W! := WiY''^ for 1 < z < fc. Note that we have v'^^ = 
for 1 ^ i ^ fc where e^^), . . . , e^") is the standard basis of F". (That is, e^'^ 
contains exactly one 1 in position i and otherwise zeros. Recall that Sj = J2]=i 
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Figure 1 : Overview of the matrix YMY ^ 
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with So = 0.) Furthermore, for 1 ^ i < fc, the space Wi = (w*-^-*, . . . , is equal 

^ _ for j > Si}. Thus, the space is the 
and we have a filtration 



' with V 



'F 



to the space {vY \ v e 
F-hnear span (e^^^e^^^ . 

O^Wo<Wl<W2<--- <wl, = v 

such that each quotient Wl/Wl_i is an M'-cychc space generated by the coset 
represented by the standard basis vector e*^*'"i+^\ 

We begin by presenting Algorithm 4 which computes the absolute order polyno- 
mial of a vector with respect to the matrix YMY^^, using all the data acquired 
during Algorithm 3. We will apply this later in the minimal polynomial algorithm 
to the first few of the vectors e^^*-^~^^^ produced during a run of Algorithm 3. Note 
that for the analysis it is crucial that a number z such that the vector v lies in 
is given as input to the algorithm. 

Proposition 7.1 (Correctness and complexity of Algorithm 4: OrdPoly). 

Let ¥ = ¥q be a field with q elements. The output of Algorithm 4 satisfies the 
Output specifications. Moreover, Algorithm 4 requires at most 




elem,entary field operations, where dj = degp^^^ , Sj = dj for j ^ 1 and sq = 0; 

and this is less than n^ for n sufficiently large. 
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Algorithm 4 OrdPoly 

Input: M, k, {Y,S,T,l), {p'^^^)i^j^k, (^^■'^)i^j^fe as returned by CharPoly, an 
integer z with 1 ^ z ^ k, v G W!,, and the factorisation p^^^ = n*=i Qp''~ fo^' aU 

Output: A list of factorised polynomials, the product of which is ordj-Mr-i (^^) 

i := z {will run down to 1} 

/ := [ ] {empty list} 

repeat 

if h^O then 

g := p^'V gcd(/i,p^*^) {factorised} 
add g to list / 

compute product g of factors in g 
if i > 1 then 

V := V ■ g{YMY~^) {see Proposition 7.1 for this computation} 

end if 
end if 
i := i — 1 
until i = 
return / 



Proof: Since we are computing an order polynomial with respect to the matrix 
M' = YMY'^^ we can always use the form of this matrix as displayed in Figure 1. 

The basic idea of Algorithm 4 is to use Lemmas 4.11 and 4.12 applied to the 
filtration 

o = w^<wi<w^<--- <wl^ = v. 

Starting with i := z and the original v lying in the space W!., the variable i 
runs downwards until 1. In each step, Algorithm 4 computes the relative order 
polynomial g := ordM'(v + for the then current value of ?; e W/. This 

assertion follows from Lemma 4.11 noting that, by our discussion above, p*-*-* = 
ordM('i^^*^ + Wi) = ordM-Ce^'*-'"^^^ + W(_-^). Next vg{M') is evaluated, which lies 
in W/_j by Lemma 4.3, and the induction can go on with i replaced by i — 1. The 
product of the polynomials in the list / returned is the product of all the relative 
order polynomials computed in the repeat loop, and this is equal to ordM'iv), by 
Lemma 4.12. 

To count the number of elementary field operations is a bit complicated here. 
Note first that by assumption we already know a factorisation of all the p*^'^ = 
Ylj=i ^ ^^^^ irreducible factors. Now gcd{h,p^^^) is equal to the product of the 
greatest common divisors gcd(/i, ), for j < t. Since the degrees of the polyno- 
mials g'J'"^ sum up to the degree di of p^'^, finding these gcd's, by Proposition 3.1, 
requires at most 

t 

2(deg(/i) + l)-^(degfe)ei,,+l) 
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field operations, which is at most 4d| since deg{qj)eij + 1 ^ 2 deg{qj)eij . Note 
that this is a rather crude estimate. At this stage we know all multiplicities of the 
Qj in gcd(/i,p^*^) and thus in g := p^*^ gcd(/i,p*^'^). Thus we have computed g in 
factor ised form, which is denoted by g in Algorithm 4. 

Now we discuss the number of operations needed to evaluate vg{M'). Due to the 
sparseness of M' , a multiplication of a vector w of W/ from the right by M' needs 
only a shift (which we neglect here) and an addition of a multiple of the non-zero 
part of b'^^^ for 1 ^ r ^ i requiring 2sr operations for each r. Thus computing 
wM' requires at most X]r=i elementary operations. Note that wM' lies in Vl^/ 
still. If f{x) G ^q[x\ of degree d, say f{x) = Ylt=o^rX^, then we can compute 
wf{M') = X;^=o Crw{M'y by first computing w{M'Y e WV for 1 < r ^ d at a cost 
of at most 2d^l^-^ Sr, next computing Crw{M'Y for ^ r < c? at a cost of at most 
{d + l)sj, and then adding these vectors at a further cost of at most dsi, making a 
total cost to compute wf{M') of at most {2d + l)si + 2rfX]r=i elementary field 
operations. 

The polynomial g{x) is available in factorised form, say g{x) = 11^=1 .fs{x) with 
deg/s = Tig, where Y^^=i "^s = deg(/ = di. From the previous paragraph we see 
that vg{M') can be computed at a cost of at most 

u / ^ \ ^ ^ 

I (2ms + l)si + 2ms Sr ) = (2oJi + u)si + 2di Sr < SdjSi + 2di s^. 

5=1 \ r=l / r=l r=l 

Subsequent runs of the repeat loop require similar numbers of elementary oper- 
ations, with i replaced by j where i — 1 ~^ j > 1. Thus Algorithm 4 needs at 
most 

^ (4d2 + 3d,s, +2d,-^s,J 

j=l \ r=l / 

elementary field operations, as claimed in the proposition. 

To find a simpler upper bound we look at the terms one by one. The last and 
most important term can be bounded by 

2/i\ Z I ^ \ ^ 

^^^'AJl'^n =2^Sr(s^-Sr-i) 

j=l \ r=l / r=l \ / r=l 

z z 
= 2Y^8r{s^- Sr) + 2Y^8rdr < (| + 2) • 

r— 1 r— 1 

since the function tisz — t) has maximum value s1/4 for t in the interval [0, s^]. 
The second term 3^^^-^ djSj is at most 3s1, and the term ^YTj=i is at most 

Altogether this amoimts to a bomid of (| + 9)s^ as claimed. Asymptotically, this 
is bounded above by n'' in the worst case as n ^ oo. ■ 
Now we present our main procedure. Algorithm 5. 

Proposition 7.2 (Correctness and complexity of Algorithm 5: MinPolyMC). 

Given a matrix M € F^^" and a number e with < e < 1/2, Algorithm 5 returns 
a tuple (6, /), where b is either True or Uncertain and f G ^q[x] is a polynomial. 
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Algorithm 5 MinPolyMC 

Input: M £ F^^", e with < £ < 1/2. 

Output: A tuple (6,/) where b is either True or Uncertain and / e Fg[x] 
(see Proposition 7.2 for details). 

((p^^^)Kj^fe, {Y,S,TJ), ■■= CharPoly(M) 

Factorise all p(^) = nt=i 9?'" 

Determine the least u e N such that Y.l=i < £ 

u := min{M, A:} 

/:= lcm(pW,...,pW) 

for I = 2 to ?i do 

/ := lcm(/, OrdPoly(M, k,{Y,S,T,l), {p^^) ) i^.-^^ , (&(^) ) KKfe - ^> e^^-^+D ) ) 
end for 

if li = fc or dcg f = n then 

return (True, /) 
else 

return (UNCERTAIN,/) 
end if 



With probability at least 1 — e the polynomial f = ijlm,¥^, and if b = True then 
f = l-i'M,¥^ is guaranteed. Moreover, if f 11m,¥^, then f is a proper divisor of 
fJ,M,¥" and every irreducible factor of iim,¥" divides f. 

The number of elementary field operations needed by Algorithm 5 is bounded 
above by 

u 

char(n, q) + fact(n, q) + ^(- + 9)s'^ 

inhere char(n, q) is an upper bound for the number of elementary field operations 
needed to compute the characteristic polynomial (see Proposition 5.1), fact(n, g) is 
an upper bound for the number of elementary field operations needed to factorise 
each of a set of polynomials over ¥q whose degrees sum to n (see Subsection 3.1). 
Moreover either u = k, or u < k and ^ £_ 

For n sufficiently large and fixed e, this is less than 

6n^ + fact(n,g) + ir^^^^^^f-^ 
3 logg 

which is less than 7n^ + fact(n, q) (plus the cost of computing at most n random 

vectors in Algorithm 3). 

Remark 7.3. Note that if we use a randomised polynomial factorisation algorithm 
(necessary for large q), then the algorithm can be modified to allow for a possible 
failure of factorisation of the 'Factorise' step (line 2). Thus Theorem 1.1 follows 
from Proposition 5. An upper bound for the term i&ci{n,q) in the complexity 
bound is given in Remark 3.3, and this yields an upper bound in Proposition 7.2 
of 0{n^ log^ q) for n sufficiently large and fixed e. 

Remark 7.4. Algorithm 5 can be changed into a deterministic one by running the 
'i-loop' for i up to fc, instead of u. An upper bound of the cost is then given by 
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replacing u by fc in the formula for the cost in Proposition 7.2. 

If /c > M, then these additional k — u runs of the 'i-loop' may be viewed as a 
'verification algorithm'. By Proposition 5, the additional cost of these extra runs is 

i=u+l ^ ^ 

and for sufficiently large n this cost is less than field operations. 

Proof of Proposition 7.2: Algorithm 5 first computes the characteristic polynomial 
of M in its action on F" and its factorisation. This computation provides firstly 
the irreducible factors qj of the minimal polynomial that allow us to determine 
M, and secondly the input needed for running Algorithm 4 to compute the order 
polynomials of i;^^-* , . . . , i'*-"-' . Thirdly, it also yields a nice base change matrix Y such 
that these order polynomials with respect to the matrix M can in fact be determined 
using Algorithm 2 for the vectors e^*"^"*"^-*, . . . , e('*"-i+^) since we have ordM(f = 
ordyMy-i(e*''-^+^) for 1 i fc. Note that p^^' = ordyMy-i (e*^^) = ov&m{v^'^^)- 
By Proposition 5.1, the vector u'-^-' is a uniformly distributed random element of 
V \ {0} if J = 1, or t/ \ . . ■ ^v^^-^^i) ^ if j > 1. Hence, by Propositions 6.3 

and 7.1, the probability that / after termination of Algorithm 5 is equal to /xm.fj 
is at least 1 — e. 

From the discussion at the beginning of Section 6, /iAf.Fj is the least common 

multiple of the k polynomials ordjvfl^'''^-'), • ■ • , ot:Am{v^''^), and hence if m = A; then 
/ = l-i'M,¥^- This also implies, since the initial value of / is lcm(p^^\ . . . ,p^''^), 
that the returned polynomial / divides fJ.M,¥^ and every irreducible factor of /i m.f™ 
divides /. In particular, if deg/ = n then we must have / = xm.fj = I^m,¥^- Thus 
if (True, /) is returned then / = /^m.fj is guaranteed. 

The number of elementary field operations needed follows from Propositions 5.1 
and 7.1 and summing. Note that, after the factorisations computed in line 2 of 
the algorithm, we neglect the forming of least common multiples and the products 
here, because all results from Algorithm 4 come already factorised into irreducible 
factors. We can thus compute the least common multiples by taking maximums of 
multiplicities. Hence the first displayed upper bound is proved. 

For the asymptotic complexity bound we have to consider the initial value of the 
number u, namely the least integer u such that X)*^;^ g""'^''^'^^ < e. The largest 
value of this sum occurs when all the qj have degree 1, and as there are then at most 
n such polynomials, J2j=i1~"'^^^''' ^ nq~". Thus u is at most the least integer 
such that ng"" ^ e, namely 

logn + log(£-i) 
logg 

and the value of u used in the algorithm is at most min{uO) k} < uq- By Proposi- 
tion 5.1, the asymptotic value of char(n, q) is less than 6n^ for n sufficiently large, 
(plus the cost k^n of making k random selections of vectors). By Proposition 7.1, 
the number of elementary field operations used for the computation of the u ^ uq 
order polynomials is at most 

V^^ * , n^ 2 ^ '^(w + 1) 2 , n 2 ^ f Uq + 37\ 2 

2^(2 + < 7 + 9ms« < uo ( — - — 1 n . 
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which, for sufficiently large n and fixed e, is less than ^Uqu'^ < n^. ■ 

8. Deterministic verification 

In this section we explain how the probabilistic result of our Monte Carlo algo- 
rithm can be verified deterministically. We begin by discussing cases that can be 
handled rather cheaply, before we present several general verification procedures, 
all of which, unfortunately, have a worst-case cost of 0{n^) field operations. 

All notation from previous sections remains in force. The first result follows 
immediately from Proposition 7.2. 

Proposition 8.1 (Cases, in which the result is already proven to be correct). // 
the output polynomial of Algorithm 5 is Xm.fj, then the output is (True, Xm.fj) 
and is correct. 

For the next result observe that, if Algorithm 5 is modified so that the for loop is 
run k times, then the resulting polynomial / is guaranteed to be the minimal poly- 
nomial, giving a deterministic algorithm with proven result. (Proof of correctness 
is given in the proof of Proposition 7.2.) 

Proposition 8.2 (Case of few random vectors chosen during comp. of Xm,fj)- 

If k ^ ^Jn, and the for loop in Algorithm 5 is run k tim,es, then the output 
polynomial is iim,f^ ■ The overall cost of this m,odification of Algorithm 5 is at most 

1 37 

char(n, q) + fact(n, q) + -n^ + — n^^^ 
elementary field operations. 

Proof: The only change to the complexity estimate is for the number of elementary 
field operations in the second last line of the proof of Proposition 7.2: 

+ 9)s? < M^sl + 9ksl < ^i^n^ + 9kn^ ^ + ^ln'/\ 

i=l 

The rest follows from Proposition 7.2. ■ 

For the case of large k, we may use the procedure suggested in Remark7.4 as 
a verification algorithm, at a cost of O(fc^n^) field operations. Two alternative 
verification procedures are given below. The first involving evaluation on vectors is 
given in Proposition 8.3, and the second using null space computations is given in 
Proposition 8.7. 

Proposition 8.3 (Verification by evaluation on vectors). 

For the output (Uncertain, /) of Algorithm 5 one can verify f = /Um,fj using 
at most dn{k — u){k + u + 4) elementary field operations where u and k are as in 

Proposition 7.2 and d = dcg /. 

Proof: The idea here is to check whether e^^^-^~^^^ f{YMY~^) is equal to zero, for 
u + 1 ^ i ^ k, hy direct evaluation using the techniques described in the proof 
of Propostion 7.1. Recall first that the result / comes in factorised form. Since 
g(si_i+i) YiQg in \Y! the arguments in the proof of Proposition 7.1 show that this 
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evaluation can be done using at most 3dsi + 2dX]r=i elementary field operations. 
Thus, an upper bound for the total cost for all these evaluations is 

k k i 

3d Si + 2d Sr- 

The first term is bounded above by 3dn{k — u). As to the second term, for 1 ^ j ^ 
u + 1, the value Sj occurs in this expression with coefficient 2d{k — u), while for 
M + 2 ^ J ^ fc, it occurs with coefficient 2d{k — j + 1). Thus the second term is at 
most 

2dn{u + l)(fc -u) + 2dn{k - u){k -u - l)/2 = dn{k - u){k + u + l). 

Adding this to the upper bound for the first term we get at most dn{k — u){k+u+4:) 
as claimed. ■ 

For the following discussion we need a lemma: 

Lemma 8.4 (Cost of evaluation of a polynomial at a matrix). Let M £ 

a matrix and f € ¥[x] a polynomial with degree d < n. Then the evaluation f{M) 

can be computed using at most 2dn^ elementary field operations. 

Proof: We take 2n^ elementary field operations as an upper bound for a matrix 
multiplication. The computation of the powers Af ^, M"^, .... M"* needs at most 2{d— 
l)n^ elementary field operations. The multiplication, for each i = 1, . . . ,d, oi 
by a coefficient of / and addition of the result to the already computed matrix (the 
sum of previous terms) needs another 2dn^ elementary field operations. Finally, 
the constant term of / has to be added along the diagonal, which is yet another n 
elementary field operations. Since rf + 1 < n < 2n^, this is altogether at most 2dnP 
as claimed. ■ 

Of course, this immediately implies: 

Corollary 8.5 (Small degree minimal polynomial). //deg/iM,Fj < n, then the 
output of Algorithm 5 can be verified by evaluation using at most 

2 • • deg a^m.fj 

elementary field operations. 

Remark 8.6. Note that using [1, Theorem 2] wc could lower the complexity in 
Lemma 8.4 to 0{Vdn^) provided we stored 0(\/rf) matrices in memory at the 
same time. However, since storing a matrix in F"^" needs O(n^) of memory, this 
approach would often become impractical before a concrete problem would become 
intractable because of time constraints. We use our estimates in Lemma 8.4 because 
of these practical considerations. However, in some practical situations, an improved 
polynomial evaluation algorithm using more memory may be suitable. 

We now present Algorithm 6 that can be run after Algorithm 5 to verify the 
correctness of the resulting polynomial deterministically. 

Proposition 8.7 (Deterministic minimal polynomial verification). 

// Algorithm 6 is called with candidate minimal polynomial Y\l=i from Algo- 
rithm 5, then it either returns True or a positive integer j. In the former case, 
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Algorithm 6 MinPoly verification 

Input: M e F"^", xm,v = 11!=! 9i* (factorised) , and a candidate Y\l^iq{' for 

I^M.F" (factorised), all data from Algorithm 5 

Output: True or a positive number j (see Proposition 8.7 for details). 

for i = 1 to t do 
if fi < e, then 

M' := q,(YMY-^y^ 
d := dimF(ker(M')) 
if < deg(gi) • Cj then 

return i 
end if 
end if 
end for 
return True 



I^M,¥^ = Y[l=i 1i' ' while in the latter case the multiplicity of qj in /Um.fj is greater 
than fj. The number of elementary field operations required by Algorithm 6 is at 
most 

t 

n'-$^(2deg9i + 2riog/il +1). 

Proof: Let Tj := deg^j for i = 1, . . . ,t. We again view F" as F[a:]-module as in 
Section 6 by letting x act as right multiplication by M. By [7, Theorem 3.12], it is 
isomorphic to a direct sum of primary cyclic F [a;] -modules 

t nii 
i=l i=l 

such that ovduiwi^j) = q-"^ with a ^ ^ /i,2 > • • • ^ fi,mi ^ 1 and J2T=i fhi 
f ; . Thus, for each i, qi occurs in /Um,fj with multiplicity fi^i, and so in particular 

fi ^ fi,i- The element q{' acts invertibly on all direct summands Wiij¥[x] with 
i' ^ i since qi is irreducible and every order polynomial of a non-zero vector in 
such a direct summand is a power of qi', by Lemma 4.11. For i' = i however, the 
dimension of the kernel of the action of q^^ on Wi_j¥[x\ is Vi ■ min{/i, /ij}. Thus the 
dimension of the kernel of the action of qf^ on the whole of F" is equal to 

rrii mi 

n ^ min{/j, fi^j] < Ti ^ fi^j = riei 

with equality if and only if fi > Since fi < fi^i, equality holds above if and 
only if fi is equal to the multiplicity of qi in /xm,fj- Therefore, Algorithm 6 
always returns the result as stated in the Proposition. 

As to the cost. Algorithm 6 evaluates qi at YMY~^ which needs at most 2rjn^ 
elementary field operations by Lemma 8.4. It then takes the result to the f!-^ power, 
which can be done by repeated squaring with at most 2n^ [log fi~\ elementary field 
operations, and finally computes the dimension of a null space, which can be done 
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with at most elementary field operations (compute a semi echelon basis of the 
row space of the matrix). Note that we are not using the sparseness of YMY~^ 
here. ■ 

Remark 8.8. The cost in Proposition 8.7 is much smaller than in many cases. 
One of the worst cases occurs when xm,F" contains lots of different factors of degree 
1 each occurring with multiplicity 3, and all the fi are equal to 2. Then Algorithm 6 
has to square about n/3 matrices and compute the null spaces of the results. This 
amounts to about 2n'*/3 elementary field operations, which is only about twice as 
fast as directly evaluating the minimal polynomial at M. Note that even in this 
case only about every sixth entry of YMY^^ is different from zero. 

Remark 8.9. As in each of our procedures there arc some simplifications wc could 
make in practice which do not reduce the worst case complexity estimates. For 
example, in Algorithm 6, there is no need to compute the kernel of qi{YMY^^)^' 
if the irreducible Qi does not divide any of the relative order polynomials p^^^ for 
u < j ^ k. 



9. Performance in practice 

In this section we give some experimental evidence concerning the performance 
of Algorithm 5 in comparison with that of algorithms currently implemented in the 
GAP library (see [4]). 

All computations were done on a machine with an Intel Core 2 Quad CPU Q6600 
running at 2.40 GHz with 8 GB of main memory and two times 4 MB of second 
level cache. 

We were unable to confirm that Magma [2] uses an algorithm based on the 
canonical forms algorithm of Alan Steel presented in [12] for computing mini- 
mal polynomials, although this is indicated in [12, Abstract] and on the web (see 
http : //magma .maths .usyd . edu . au/magma/htmlhelp/text347 . htm). 

Our colleague Colva Roney-Dougal kindly ran the Baby Monster example matrix 
M2 on Magma and the resulting times were roughly equivalent to the timing in the 
column "Lib" of Figure 2. suggesting that this is indeed the case. Since the minimal 
polynomial algorithm in the GAP library is also based on the algorithm in [12], we 
did not conduct extensive comparison tests of our algorithm on Magma. 

9.1. Guide to the test data 

The timing results are in Figure 2, all times are in seconds. The column marked 
"n" contains the dimension of the matrix, the column marked "g" the number 
of elements of the base field. The columns marked "Lib" and "AS" contain the 
times needed for one run of the minimal polynomial algorithm based on [12] as 
implemented in the GAP library, and as implemented (by the first author) in the 
GAP language, respectively. The column "MC" contains the total time for our 
Monte Carlo algorithm as presented in Algorithm 5. The next three columns marked 
"Spin" , "Fact" and "OrdP" contain the times for the three phases of this algorithm 
respectively, namely the first phase to compute the characteristic polynomial via 
relative order polynomials, the second phase to factor all factors of the characteristic 
polynomial and count multiplicities, and the third phase to compute some absolute 
order polynomials to guess the minimal polynomial. Finally, the last column marked 
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"Ver." contains the time for the deterministic verification via Algorithm 6. The 
maximal error probability for our Monte Carlo algorithm was e = 1/100 for all 
runs. 

9.2. The test matrices 

Next, we describe the matrices Mi, . . . , Miq we used. 

(a) The matrices Mi and M{ were purely random matrices from ipioooxiooo 
with all entries chosen with uniform distribution from the; field F3. Such matrices 
are with very high probability cyclic, that is, their characteristic and minimal poly- 
nomials are equal. Usually, Algorithm 3 only has to pick very few random vectors 
for such matrices. The for loop of Algorithm 6 quickly checks whether the least 
common multiple of the relative order polynomials (which is the input candidate 
polynomial) already has degree n. It turned out that M{ was cyclic but not Mi, 
and this explains the big differences in the runtimes for these matrices. 

(b) The matrix M2 is one coming from actual applications. Namely, it is the 
matrix a + b + ab where the two matrices a, 6 e ]f4370x4370 (jeggj-i^g ^j^g action of 
two standard generators of the Baby monster sporadic simple group on its smallest 
faithful simple module over F2. The matrices a and b were downloaded from the 
WWW Atlas of group representations (see [16]). The matrix a + b+ab is interesting 
because it is one of the algebra words that is used in the Meat Axe (see [11] and 
[6]) to compute composition series of modules and we could very well imagine using 
the minimal polynomial instead of the characteristic polynomial in some places in 
the MeatAxe. 

The reason why the standard algorithm for the minimal polynomial performed 
rather badly on this matrix is that its characteristic polynomial has irreducible 
factors of degrees 1, 1, 2, 4, 6, 88, 197, 854 and 934 with respective multiplicities 
2, 2277, 4, 1, 1, 1, 1, 1 and 1. Therefore the standard algorithm spins up large 
subspaces many times. 

(c) The matrices M3 - M7 were constructed in the following way: In the lan- 
guage of F [a;] -modules we chose the order polynomials of the generators of their pri- 
mary cyclic submodules, that is we chose the minimal polynomials on the primary 
cyclic submodules. For irreducible factors of degree one this amounts to choosing 
the sizes and numbers of the Jordan blocks occurring in the Jordan normal form 
of the matrix. After writing down the corresponding normal form of the matrix we 
conjugated it with a random element of the general linear group to get a dense 
matrix with the same normal form. 

For M3 G ]p600x600 chose one cyclic summand with minimal polynomial 
(x — Cs)"^"*^ plus 300 summands with minimal polynomial a; — C5. where (^5 G F5 
is a primitive root. This is a typical case in which our Monte Carlo algorithm and 
the deterministic verification both perform very well in comparison with older tech- 
niques. The reason for this is that the high dimensional cyclic subspace is spun up 
many times in the standard minimal polynomial algorithm as for the matrix M2. 

For M4 G ^1200x1200 chose 400 cyclic summands with minimal polynomial 
{x — Ca)^ plus 400 cyclic summands with minimal polynomial (x — Cs); where again 
Cs G F3 is a primitive root. In contrast with the matrix M3, our algorithms per- 
formed very well in this case but they were not much faster than the older tech- 
niques, since the standard algorithm run on M4 does not spin up many large cyclic 
subspaces. 
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Figure 2: Timings for minimal polynomial computation 
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For M5 € F251 wc chose 200 different linear factors x-a and for each added 
one cyclic space with minimal polynomial {x — a)^ and one with x — a. This example 
originally was a worst case scenario for our deterministic verification. However, since 
fc = 2 is quite small, a deterministic verification can be done relatively cheaply 
as described in Proposition 8.2 and Remark 7.4, even though the integer u in 
Algorithm 5 is only 1. The deterministic verification Algorithm 6 ran very slowly 
(more than 300 seconds) in this example. 

For Mq € F2^''^^^^''^ we chose the irreducible polynomial f{x)=x^+x'^ + l€ 
¥2[x] of degree 3 and added cyclic spaces with respective minimal polynomials Z^^", 

For My e ]p22"^220 chose an irreducible polynomial f{x) e Fio[a;] of degree 
10 and added cyclic spaces with respective minimal polynomials /^°, /, /, 
f and f. 

(d) The matrices Mg and A'/g were standard generators of GL(400, 17), conju- 
gated by the pseudo-random element Mio of the same group. Note that Mg had 
order 16 while Mg and Mio had very high order and were cyclic matrices. We chose 
these examples because they may be typical of difficult cases in an application of 
the minimal polynomial algorithm for computing the projective order of a matrix. 

Our algorithm very quickly discovered that the least common multiple of the 
relative order polynomials was already equal to the characteristic polynomial. 
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